Abstract: Maintenance practices have long focused on time based “preventive
maintenance” techniques. Components were changed out and parts replaced based on
how long they had been in place rather than what condition they were in. A reliability
centered maintenance (RCM) program seeks to offer equal or greater reliability at
decreased cost by ensuring that only applicable, effective maintenance is performed and
by in large part replacing time based maintenance with condition based maintenance. A
significant portion of this program involved introducing non-intrusive technologies, such
as vibration analysis, oil analysis and infrared (I/R) cameras, to an existing labor force
and management team.

This paper discusses what is involved in an RCM program and how EG&G is 
implementing it at Kennedy Space Center on the facilities maintenance program. It 
discusses technical tools, management tools and people issues involved in achieving the 
goal of “better, faster, cheaper” in the facilities arena. 
The maintenance program is an integrated, closed loop, continuous improvement process
that includes life cycle maintenance planning, asset risk assessment, runtime, calendar & 
condition based maintenance, outage coordination, facility condition assessment and cost 
accounting. The maintenance program is proactive in nature, reliability centered and is a 
true asset management program. Program effectiveness is measured in terms of asset 
availability, reliability and life cycle cost. 


An essential element in the program is the computerized maintenance management 
system (CMMS) with the capability to interface electronically with subject matter 
specific software such as predictive maintenance software programs for vibration 
analysis. The software provides the traditional productivity and maintenance cost reports 
as well as asset condition and maintenance requirements reports. It generates work orders 
based on asset condition triggers and time based or usage based preplanned frequencies. 
The asset inventory, with pertinent data including risk codes and RCM analysis 
information, is contained in the CMMS. This enables Maintenance Engineers to trend 
equipment failures for further analysis, and is the means of continually improving the 
effectiveness of the assigned levels of maintenance associated with an asset or definable
group of assets. 
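
The work order generation described above can be pictured with a minimal Python sketch.
It is an illustration only, not the CMMS actually used in the program: the Asset record,
its field names and the threshold values are all hypothetical. It shows the three kinds
of trigger named in the text (time based, usage based and asset condition).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Asset:
    """Hypothetical CMMS asset record; every field name is illustrative."""
    tag: str
    last_pm: date                       # date of the last preventive maintenance
    pm_interval_days: Optional[int]     # calendar based frequency, if assigned
    run_hours_since_pm: float           # usage accumulated since the last PM
    pm_interval_hours: Optional[float]  # usage based frequency, if assigned
    condition_reading: Optional[float]  # latest predictive reading, e.g. vibration
    condition_alarm: Optional[float]    # alarm level for that reading


def work_order_reasons(asset: Asset, today: date) -> List[str]:
    """Return the triggers (possibly none) that call for a work order today."""
    reasons = []
    if asset.pm_interval_days is not None:
        if today >= asset.last_pm + timedelta(days=asset.pm_interval_days):
            reasons.append("calendar")
    if asset.pm_interval_hours is not None:
        if asset.run_hours_since_pm >= asset.pm_interval_hours:
            reasons.append("runtime")
    if asset.condition_reading is not None and asset.condition_alarm is not None:
        if asset.condition_reading >= asset.condition_alarm:
            reasons.append("condition")
    return reasons
```

An asset whose reading is above its alarm level produces a "condition" trigger even when
no calendar or runtime interval has elapsed, which is the behavior that lets condition
based maintenance replace fixed intervals.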

Components

The Maintenance Program is a closed loop process that ensures continuous
program improvement. The first functional component of the process is an accurate
inventory of assets included in the maintenance program. It is critical to know what is
being maintained and to have it accurately identified in the CMMS.


Life Cycle Planning ensures the function of the assets is clearly defined, understood and 
documented and maintenance requirements are planned for the designed life of the asset. 
This occurs during the design process for new assets and is documented taking into 
account such things as ease of access to components, minimization of special tooling, 
incorporation of data for predictive maintenance condition trending, etc. Consideration is 
also given to the expected life of materials specified in the design and program 
maintenance requirements resulting from expiration of the materials' useful life (e.g.
repainting structures on a 7-8 year cycle, replacing roofing systems on 20 year cycles,
etc.). The more routine recurring maintenance, including preventive tasks (service,
inspections and minor repair) and predictive testing, is identified using the RCM
methodology. For existing assets, this takes place during the RCM analysis.


Once the asset inventory is established and entered into the computerized information
system and the function of the individual or defined group of assets is clearly understood
and documented, a risk assessment is performed. It evaluates the impact of a loss of
function of the asset to determine the appropriate asset risk category.
Assets fall within four basic risk categories (high, medium, low or negligible) based on 
the lack of ability to support mission or the cost involved should there be a loss of asset 
function. This risk assessment is the first step in developing maintenance requirements 
under an RCM methodology. 
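
As a rough illustration of this step, the sketch below maps the consequence of a loss of
function to one of the four risk categories, using the consequence descriptions that
accompany the decision logic tree in Figure 1. The function name, its two inputs and the
string labels are assumptions made for the example; the actual assessment is an
engineering judgment, not a lookup.

```python
from enum import Enum


class Risk(Enum):
    HIGH = "high"
    MEDIUM = "medium"          # shown as MODERATE RISK in Figure 1
    LOW = "low"
    NEGLIGIBLE = "negligible"


def risk_category(safety_or_flight_hardware_loss: bool, consequence: str) -> Risk:
    """Map the consequence of a loss of asset function to a risk category.

    safety_or_flight_hardware_loss: loss of life/serious injury, or loss or
        damage of the shuttle, payload or a major assembly.
    consequence: "major", "minor" or "inconvenience" (illustrative labels for
        the operational, environmental or political consequences).
    """
    if safety_or_flight_hardware_loss:
        return Risk.HIGH
    if consequence == "major":
        return Risk.MEDIUM
    if consequence == "minor":
        return Risk.LOW
    if consequence == "inconvenience":
        return Risk.NEGLIGIBLE
    raise ValueError(f"unknown consequence: {consequence!r}")
```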

A significant component of the program in terms of cost effectiveness is the methodology
for determining maintenance requirements. The RCM philosophy is a departure from
traditional methods of determining maintenance requirements. RCM logically
incorporates the most effective mix of reactive, preventive, predictive and proactive
maintenance practices and draws on their respective strengths. RCM applies the four
maintenance practices where each is most appropriate based on the consequences of 
failure and the resulting impact to mission. This combination produces optimum 
reliability at minimum maintenance cost and the combined benefits far exceed those 
resulting from using any one maintenance practice. RCM incorporates the principle that 
any maintenance task performed must be proven to be applicable and effective. 

Applicable means that the performance of the task will prevent, mitigate or detect the
onset of a failure or discover a hidden failure that has already occurred. Effective means
that, of the competing tasks, the selected task is the most cost effective option.


During an RCM analysis, engineers use a decision logic tree to assign the proper mix of
maintenance (Figure 1).
Figure 1. RCM decision logic tree (figure not reproduced; recoverable labels follow).

HIGH RISK
• Loss of life/serious injury
• Loss of shuttle/payload
• Loss/damage of a shuttle system or major assembly/component

MODERATE RISK
• Major operational, environmental, or political consequences

LOW RISK
• Minor operational, environmental, or political consequences

NEGLIGIBLE RISK
• Inconvenience

Outcome box: Failure falls into risk Category D. No PM is recommended.

APPLICABLE - Task will prevent, mitigate or detect the onset of a failure or discover a
hidden failure that has already occurred.

EFFECTIVE - Among competing candidates, the selected task is the most cost effective
option.


This decision logic tree focuses on sustaining the reliability of assets in support of a 
defined mission. The RCM analysis is structured to implement the principle that no 
maintenance task will be performed unless it is justified. The criteria for justification are 
safety, reliability and cost effectiveness in deferring or preventing a specific failure mode. 
Because RCM is reliability based, statistical analysis and conditional probabilities of 
failure are important in determining the consequences of failure. The primary objective is 
to maintain the inherent reliability designed into the asset. The product of the RCM 
analysis is work procedures for both preventive and predictive maintenance that are 
captured in the CMMS. The performance schedule is also generated in the CMMS as a 
basis for initiating preventive/predictive maintenance. 
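
The justification principle can be sketched in a few lines of Python, following the
definitions given with Figure 1: a task must address the failure mode (applicable) and,
among the competing candidates, be the most cost effective (effective). This is a
simplified illustration that looks at cost only and leaves out the safety and reliability
criteria; the CandidateTask record, the cost figures and the rule that a task must cost
less than the failure it avoids are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateTask:
    """Hypothetical candidate maintenance task for one failure mode."""
    name: str
    addresses_failure_mode: bool  # prevents, mitigates or detects the failure
    annual_cost: float            # assumed yearly cost of performing the task


def select_task(candidates: List[CandidateTask],
                annual_cost_of_failure: float) -> Optional[CandidateTask]:
    """Return the justified task, or None when run-to-failure is the outcome."""
    # Applicable: the task must act on the failure mode being analyzed.
    applicable = [t for t in candidates if t.addresses_failure_mode]
    # Assumed justification rule: the task must cost less than the failure.
    justified = [t for t in applicable if t.annual_cost < annual_cost_of_failure]
    if not justified:
        return None
    # Effective: among the competing candidates, pick the cheapest.
    return min(justified, key=lambda t: t.annual_cost)
```

When no candidate survives both checks the function returns None, which corresponds to
assigning no maintenance task and letting the item run to failure.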


The next program component, Facility Condition Assessment (FCA), is important to
maintenance engineers and managers as it provides feedback on asset maintenance
effectiveness. The FCA is an asset inspection and engineering analysis of maintenance
history, failure trends, any root cause failure analysis that might have been performed and
any open or planned work requirements. The purpose of the FCA is to validate
maintenance requirements identified during life cycle planning, review the
effectiveness of the assigned mix of predictive, preventive and reactive maintenance and
revise it as needed, identify any new asset deficiencies that may have been detected
during the assessment process, review planned maintenance work and review energy
issues, if applicable.


Another important part of the FCA is validating the mission of the asset. Changes in
program requirements often drive changes in asset mission. When mission changes
occur, the level of assigned maintenance may require adjustment due to changes in asset 
criticality. We perform FCAs on a five year cycle to coincide with the budget cycle. 
Knowing the asset mission, the asset maintenance history, the identified and planned 
maintenance requirements and the current condition of the asset, work can be prioritized 
and programmed for performance over the budget cycle. Existing maintenance 
procedures can be validated and adjusted as required, monitoring programs implemented 
and tests conducted on assets to further evaluate any suspected problems. The FCA 
provides a structured process for validating, justifying and prioritizing maintenance 
requirements. 


An appropriate level of maintenance cannot be assigned to an asset unless the
consequences of failure of that asset are clearly understood. RCM forces focus on the
product of a system, rather than on individual items within a system. As a result, many
items which are critical to a system operation are found to have backups or work-arounds
designed into the system, so a failure or loss of an individual item does not result in a
system failure. An example of this may be in electric power distribution, where power to
a specific facility is critical. The loss of the feeder cable will result in no power through
that cable. It will not result in a power loss to the facility, however, because the facility
has dual power feed from independent circuits, an emergency backup generator and a
UPS. The system does not fail, only the component.
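
The feeder cable example amounts to a simple redundancy check, sketched below in
Python. The source names and the dictionary layout are hypothetical; the point is only
that the function being protected (power to the facility) is lost when every independent
source is lost, not when one of them is.

```python
from typing import Dict


def facility_has_power(sources_available: Dict[str, bool]) -> bool:
    """The facility keeps power while at least one independent source is up."""
    return any(sources_available.values())


# One feeder cable has failed; the remaining sources still carry the load.
sources = {
    "feeder A": False,
    "feeder B": True,
    "backup generator": True,
    "UPS": True,
}
system_ok = facility_has_power(sources)
```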


Risk assessment is the first step in determining maintenance levels. Four risk levels have
been established, based on the consequences of failure: high, medium, low and no risk.
High and medium risk codes are often associated with catastrophic failures, but because
of economic impact costs, smaller failures can also fall into this area. If a facility
suffers a loss of utilities and has no secondary feed (either onsite or portable), the people
in that building will have to stop work and leave. This "impact cost", different from a
repair cost, is not obvious to maintainers but is real and must be a factor in evaluating
the risk level. The RCM analysis is structured to implement the principle that no
maintenance task will be performed unless it can be justified. The criteria for 
justification are safety, reliability, and cost effectiveness in deferring or preventing a 
specific failure mode. Because RCM is reliability based, statistical analysis and 
conditional probabilities of failures are important in determining the consequences of 
failure. The primary objective is to maintain the inherent reliability designed into the 
equipment. Figure 2 graphically ties all the parts together. 
In this model, it has been determined during design that the facility is expected to
support its defined function for 100 years. The roof of the facility, given the climate the
facility will be subjected to, will require major refurbishment after approximately 20
years of service. Therefore, as part of the life-cycle plan, a major refurbishment is
identified 20 years from the date of facility activation. In addition, while determining the
exterior paint specifications, historical data and engineering studies indicate that
facilities require repainting every five years. This, too, is added to the life-cycle plan as
an identified program level maintenance requirement. Infrared thermography is also
scheduled on a frequent basis. It is used to perform a condition assessment of the roof and
the electrical panels throughout the facility in lieu of previously assigned labor intensive
PM tasks. Air filters are replaced on a regularly scheduled basis.
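
The arithmetic behind such a life-cycle plan is simple enough to sketch in Python. The
example assumes that each program level requirement simply repeats at its interval over
the whole design life, including the final year; the text only states the first roof
refurbishment at 20 years, so the repetition is an assumption, and the function and task
names are made up for the illustration.

```python
from typing import Dict, List


def life_cycle_plan(design_life_years: int,
                    intervals: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each year of service to the program level tasks due in that year."""
    plan = {}
    for year in range(1, design_life_years + 1):
        # A task is due when the year is a whole multiple of its interval.
        due = [task for task, every in intervals.items() if year % every == 0]
        if due:
            plan[year] = due
    return plan


# Intervals taken from the model in the text: repaint every 5 years,
# refurbish the roof every 20 years, over a 100 year design life.
plan = life_cycle_plan(100, {"repaint exterior": 5, "refurbish roof": 20})
```

Years in which both tasks fall due (20, 40 and so on) show up as a single entry, which is
the kind of overlap a planner would use to coordinate outages and budget requests.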



Predictive Maintenance, also known as Predictive Testing and Inspection (PT&I), can 
determine the condition of the equipment and provide various trend indicators. 
Interpretation of these indicators allows potential functional failure to be forecast so 
corrective maintenance can be performed to preclude failure. Working as a complement
to the PM program, a PT&I program can:

o Help determine the condition of a component and identify required repairs before
that component fails.

o Conserve resources by performing maintenance on an as-required basis rather
than on a calendar frequency or a run-time basis.

o Minimize down time.

The effectiveness of each applicable PT&I test is examined to determine which test or 
combination of tests will be used. Any test by itself may not give a good representation 
of the overall condition of each piece of equipment on the system. However, certain 
combinations will give a very good indication of equipment condition. Comparisons
with previous tests provide trend data useful in condition assessment analysis.
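
One way to turn such comparisons into a forecast is to fit a straight line to successive
readings and estimate when an alarm level would be reached. The Python sketch below
does this with an ordinary least-squares fit. The linear trend, the equally spaced
sampling and the alarm level are assumptions made for the illustration; the paper does
not specify a trending method.

```python
from typing import List, Optional


def periods_until_alarm(readings: List[float], alarm: float) -> Optional[float]:
    """Estimate how many sampling periods remain until the alarm level.

    Fits a least-squares line to equally spaced readings. Returns 0.0 if the
    latest reading is already at or above the alarm, and None when there are
    too few readings or no upward trend.
    """
    n = len(readings)
    if n < 2:
        return None
    if readings[-1] >= alarm:
        return 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Index at which the fitted line crosses the alarm, relative to the last reading.
    x_alarm = (alarm - intercept) / slope
    return max(0.0, x_alarm - (n - 1))
```

A short remaining time is a prompt for closer inspection or a second PT&I technology,
not a replacement for engineering review of the data.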

Historically, the focus of maintenance has been the Preventive Maintenance (PM)
program. Electrical and mechanical equipment deteriorates over time, and this
deterioration eventually causes it to fail. PM is used to slow this deterioration, ensuring
the equipment’s operational life. A properly conducted program reduces overall operating
costs, aids mission effectiveness and safety, and assures the continued preservation,
usefulness, and performance of assets. The PM program, coupled with the other
elements of the overall maintenance program, allows engineers to be aware of equipment
condition so that sufficient time is available for the systematic planning and scheduling of
required repair work.

Preventive maintenance consists of the planned and scheduled maintenance tasks that are
periodically performed on equipment to avoid a breakdown. The frequency is based on 
calendar date, rate of utilization (routine), or condition which is determined by trending 
data collected through the application of PT&I technologies. The PM program consists of 
the following: 

o Inspections of mechanical, electrical and other physical structures, installed
equipment and systems such as motors, pumps, compressors, faucets, light switches, etc.

o Inspections are performed on a periodic, predetermined basis in an effort to
determine the degree of operating efficiency and whether equipment deficiencies exist.

o Routine servicing of equipment including lubrication, cleaning and changing
filters, minor adjustments and parts replacement, and condition reporting.

o Formalized evaluation and work generation system which ensures discovered,
uncorrected deficiencies are entered into the normal planning and scheduling system.


Run-to-failure is a reactive component because it is based on the premise that no 
maintenance task that improves the reliability of the F/S/E in a cost effective manner has 
been identified. Users call a trouble desk to report breakdowns on run-to-failure items. 
When the corrective action required is beyond the scope of a trouble call, if engineering 
is required, or if material must be ordered, the trouble call is changed to a repair work 
order. As with other work orders, labor, materials and material costs are tracked in a 
CMMS. This information is then sent to a computer history file which can be retrieved 
later for use in F/S/E condition assessments, making repair/replace decisions, failure 
trending, and other engineering analysis. 


Maintenance Effectiveness

The effectiveness of the maintenance program must be
measured and validated. Long term effectiveness is monitored through the facility
condition assessment while short term effectiveness is determined using failure trending
analysis, which highlights failure trends on like equipment. This advance notice gives
time to take action to prevent catastrophic failures.

Failure trending codes are developed by maintenance engineers with support from field 
technicians. These codes are used by the technicians in the field to track and classify 
failures and are recorded in the CMMS. The coding structure, coupled with existing 
report filter capabilities, allows a relatively quick analysis of failure data. If a problem is 
suspected, a more detailed analysis is performed. Reports provide information on the 
following elements: 1) number of loss-of-function events; 2) cause of loss; 3) disposition
of cause; and 4) corrective action taken. 
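
A quick first-pass analysis of coded failure records might look like the Python sketch
below. The record layout (equipment class, failure code), the example codes and the
threshold of three events are all hypothetical; they stand in for whatever coding
structure and report filters the CMMS provides.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recurring_failures(records: Iterable[Tuple[str, str]],
                       min_count: int = 3) -> List[Tuple[str, str, int]]:
    """Return (equipment_class, failure_code, count) for repeated failures.

    Results are sorted with the most frequent combination first; ties are
    broken alphabetically so the output is stable.
    """
    counts = Counter(records)
    flagged = [(cls, code, n) for (cls, code), n in counts.items() if n >= min_count]
    return sorted(flagged, key=lambda row: (-row[2], row[0], row[1]))


# Made-up loss-of-function events recorded by technicians in the field.
events = [("pump", "BRG")] * 3 + [("pump", "SEAL")] + [("fan", "BELT")] * 2
suspects = recurring_failures(events)
```

Combinations flagged this way are candidates for the more detailed analysis mentioned
above; the count alone does not establish a cause.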


PROGRAM MEASUREMENTS (METRICS)

The following metrics are reported to measure the progress and cost effectiveness of
the maintenance program.

a. Equipment Availability

% = (Hours System/Equipment is Available to Run at Capacity /
     Total Hours During the Reporting Time Period) × 100

b. Maintenance Overtime Percentage

% = (Total Maintenance Overtime Hours During Period /
     Total Regular Maintenance Hours During Period) × 100

c. Percent of Emergency Work to Routine Work

% = (Total Emergency Hours / Total Maintenance Hours) × 100

d. Percent of Faults Found in Thermographic Survey

% = (Number of Faults Found / Number of Devices Surveyed) × 100

e. Total cost of maintenance per year
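
The ratio metrics above reduce to the same calculation, shown here as a small Python
sketch. The function names are invented for the example and the sample figures in the
usage lines are made up; they are not results from the program.

```python
def percent(numerator: float, denominator: float) -> float:
    """Express numerator / denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def equipment_availability(hours_available: float, total_hours: float) -> float:
    """Metric a: hours available to run at capacity over total hours in period."""
    return percent(hours_available, total_hours)


def overtime_percentage(overtime_hours: float, regular_hours: float) -> float:
    """Metric b: maintenance overtime hours over regular maintenance hours."""
    return percent(overtime_hours, regular_hours)


def emergency_work_percentage(emergency_hours: float,
                              total_maintenance_hours: float) -> float:
    """Metric c: emergency hours over total maintenance hours."""
    return percent(emergency_hours, total_maintenance_hours)


def thermographic_fault_percentage(faults_found: int, devices_surveyed: int) -> float:
    """Metric d: faults found over devices surveyed."""
    return percent(faults_found, devices_surveyed)


# Illustrative inputs only: a 720 hour month with 700 hours available,
# and 50 overtime hours against 1000 regular hours.
availability = equipment_availability(700, 720)
overtime = overtime_percentage(50, 1000)
```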

Figure 3 shows some results obtained by the program as measured by two metrics. 
Lessons Learned

Many of the lessons we learned are available from existing texts, both
technical and management. It is perhaps inevitable that lessons have to be learned
individually in order to be understood, and so many of the lessons presented here are of
an obvious nature. By far our biggest finding was the value of repetition. By definition,
a cycle of continuous improvement implies doing the same thing over and over and
getting a bit better each time. Implementing a reliability centered maintenance program
involves changing the way people think and work. Training, explanations, briefings,
analysis, making changes and tracking results were done on an individual basis, shop by
shop. Selecting a visible, intuitive initial technology is also an important point. Laser
alignment was easily demonstrated, learned and understood; vibration monitoring is more
involved and less readily grasped. I/R cameras are so advanced that operation is simple:
point and shoot technology allows anyone to actually see the temperature difference
between a loose connection and a proper one. As we were able to show results, we began
to build a cadre of supporters who functioned as champions in their own right.


When we began this project, we went through a developmental phase, an implementation 
phase and are now in an operational mode. It is no longer a phase - we have achieved a 
shift in the way we do business. The very nature of the process ensures it will repeat
itself over and over - a cycle of continuous improvement. This program is not something 
we do - it is a way of getting things done in an efficient, cost effective and risk 
appropriate manner. 